Lesson 3 Tidy data

… in which we explore the concept of tidy data and learn more advanced data wrangling techniques.

3.1 Recap

  • a speedrun of lectures 1 and 2

3.2 Tidy data

3.2.1 What is tidy data, and why use it?

Figure from https://r4ds.had.co.nz/tidy-data.html (Wickham and Grolemund [10])

palmerpenguins::penguins
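As a rough illustration of why the tidy layout is convenient, the sketch below works on `table1` from tidyr, where each variable is a column and each observation is a row. The `rate` column, the per-10,000 scaling, and the names `rates` and `cases_per_year` are illustrative choices, not part of the lesson's own code.

```r
library(dplyr)
library(tidyr)  # ships the example tables table1 ... table5

# cases and population are separate columns,
# so a derived variable is a single mutate() call.
rates <- table1 %>%
  mutate(rate = cases / population * 10000)

# Grouped summaries follow the same pattern.
cases_per_year <- table1 %>%
  group_by(year) %>%
  summarise(total_cases = sum(cases))

rates
cases_per_year
```

The same two computations on the untidy variants (`table2`, `table3`, `table4a`) would first need the reshaping steps covered in the following sections.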

3.2.2 Make data tidy

with the tidyr package.

“Happy families are all alike; every unhappy family is unhappy in its own way”
— Leo Tolstoy (https://tidyr.tidyverse.org/articles/tidy-data.html)

Let’s make some data tidy!

  • table1
library(tidyverse)  # provides tidyr, dplyr, readr and ggplot2 used below
table1

3.2.3 pivot_wider

  • table2
table2
table2 %>% 
  pivot_wider(names_from = type, values_from = count)
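One way to convince yourself that the widening worked is to compare the result with `table1`, which holds the same data in tidy form. This is only a sanity-check sketch; the name `widened` is made up for the illustration.

```r
library(dplyr)
library(tidyr)

# Each value of `type` becomes a column, filled from `count`.
widened <- table2 %>%
  pivot_wider(names_from = type, values_from = count)

# The result should have the same columns and row count as table1.
names(widened)
names(table1)
nrow(widened) == nrow(table1)
```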

3.2.4 separate

  • table3
# rate holds "cases/population" as text; split it at the slash
table3 %>% 
  separate(col = rate, into = c("cases", "population"), sep = "/")
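Note that `separate()` leaves the new columns as character vectors by default. A minimal sketch of the difference, using the `convert` argument of `separate()`; the names `as_text` and `as_numbers` are only for illustration.

```r
library(dplyr)
library(tidyr)

# Default: both new columns stay character.
as_text <- table3 %>%
  separate(col = rate, into = c("cases", "population"), sep = "/")

# convert = TRUE asks tidyr to run type conversion on the new columns.
as_numbers <- table3 %>%
  separate(col = rate, into = c("cases", "population"), sep = "/", convert = TRUE)

sapply(as_text, class)
sapply(as_numbers, class)
```

Converting explicitly with `parse_number()`, as done for `table5` later in this lesson, is an alternative when more control over the parsing is needed.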

3.2.5 pivot_longer

  • table4a
table4a %>% 
  pivot_longer(-country, names_to = "year", values_to = "cases")
table4b %>% 
  pivot_longer(-country, names_to = "year", values_to = "population")
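# Both tables need the same reshaping, so wrap it in a function;
# values_column is the name (a string) for the new value column.
clean_wide_data <- function(data, values_column) {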
  data %>% 
    pivot_longer(-country, names_to = "year", values_to = values_column)
}

clean4a <- table4a %>% 
  clean_wide_data("cases")
clean4b <- table4b %>% 
  clean_wide_data("population")

3.2.6 left_join

  • table4a and table4b
left_join(clean4a, clean4b, by = c("country", "year"))
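To make the behaviour of `left_join()` a bit more concrete, the sketch below rebuilds the two long tables and then joins against a deliberately incomplete right-hand table. Dropping the first row with `slice()` is an artificial step added only to show what happens to unmatched rows; `joined` and `partial` are illustrative names.

```r
library(dplyr)
library(tidyr)

clean4a <- table4a %>%
  pivot_longer(-country, names_to = "year", values_to = "cases")
clean4b <- table4b %>%
  pivot_longer(-country, names_to = "year", values_to = "population")

# Every (country, year) pair has a match, so no values are missing.
joined <- left_join(clean4a, clean4b, by = c("country", "year")) %>%
  mutate(year = as.integer(year))  # the former column names arrive as text

# left_join() keeps all rows of the left table;
# rows without a match on the right get NA in the new columns.
partial <- left_join(clean4a, slice(clean4b, -1), by = c("country", "year"))

joined
partial
```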

3.2.7 unite

  • table5
table5 %>% 
  unite("year", century, year, sep = "") %>% 
  separate(rate, c("cases", "population")) %>% 
  mutate(
    year = parse_number(year),
    cases = parse_number(cases),
    population = parse_number(population)
  )
# Same result, applying parse_number to several columns at once with across()
table5 %>% 
  unite("year", century, year, sep = "") %>% 
  separate(rate, c("cases", "population")) %>% 
  mutate(
    across(c(year, cases, population), parse_number)
  )
# Shorter still: convert every column except country
table5 %>% 
  unite("year", century, year, sep = "") %>% 
  separate(rate, c("cases", "population")) %>% 
  mutate(
    across(-country, parse_number)
  )

3.2.8 Another example

  • billboard
  • explicit vs implicit NAs
  • na.omit
billboard %>% 
  pivot_longer(starts_with("wk"), names_to = "week", values_to = "placement") %>% 
  mutate(week = parse_number(week))
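# The same reshaping, with the prefix stripping and type conversion
# done by pivot_longer itself
tidy_billboard <- billboard %>% 
  pivot_longer(starts_with("wk"),
    names_to = "week",
    values_to = "placement",
    names_prefix = "wk",
    names_transform = list(week = as.integer)
  )
plt <- tidy_billboard %>% 
  ggplot(aes(week, placement)) +
  # geom_point has no label aesthetic of its own (ggplot2 warns about it);
  # it is mapped here so that ggplotly() can show it on hover
  geom_point(aes(label = paste(artist, track))) +
  geom_line(aes(group = paste(artist, track)))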

plotly::ggplotly(plt)
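The bullets on explicit vs implicit NAs and `na.omit` can be sketched as follows. After pivoting, every track gets a row for every week column, so weeks in which a song was no longer charting show up as explicit `NA` rows. The names `long`, `dropped` and `omitted` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Explicit NAs: one row per track and week, whether or not the song charted.
long <- billboard %>%
  pivot_longer(starts_with("wk"),
    names_to = "week", values_to = "placement",
    names_prefix = "wk", names_transform = list(week = as.integer)
  )

# Implicit NAs: rows with a missing placement are left out while pivoting.
dropped <- billboard %>%
  pivot_longer(starts_with("wk"),
    names_to = "week", values_to = "placement",
    names_prefix = "wk", names_transform = list(week = as.integer),
    values_drop_na = TRUE
  )

# na.omit() removes rows that have an NA in any column, not only placement.
omitted <- na.omit(long)

c(long = nrow(long), dropped = nrow(dropped), omitted = nrow(omitted))
```

Which form is preferable depends on the question: the explicit version makes it easy to count missing weeks, while the compact version is usually nicer for plotting.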

3.3 More shapes for data

  • omitted:
    • matrices
    • arrays

3.3.1 Lists

c(first = 1, second = 2)
 first second 
     1      2 
x <- list(first = 1, second = 2, "some text", list(1, 2), 1:5)
x
$first
[1] 1

$second
[1] 2

[[3]]
[1] "some text"

[[4]]
[[4]][[1]]
[1] 1

[[4]][[2]]
[1] 2


[[5]]
[1] 1 2 3 4 5
palmerpenguins::penguins
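A short sketch of how elements are pulled out of the list `x` defined above, using base R only. The small data frame `df` is a made-up example, added on the assumption that the point of showing a data frame in this section is that it is itself a list of columns.

```r
x <- list(first = 1, second = 2, "some text", list(1, 2), 1:5)

x["first"]     # single brackets: still a list, of length 1
x[["first"]]   # double brackets: the element itself
x$second       # $ is shorthand for [[ ]] with a name
x[[4]][[2]]    # step into the nested list
x[[5]][2:3]    # element 5 is an integer vector, subset as usual

# A data frame is a list of equal-length columns.
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
is.list(df)
df[["a"]]
```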

3.3.2 Nested data

example <- tibble(
  x = 1:3,
  y = list(
    "hello",
    TRUE,
    1:4
  )
)

example
# View(example)
nested <- palmerpenguins::penguins %>% 
  nest(data = -island)

nested
nested$data[[1]]
nested %>% 
  unnest(data)
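One thing nesting makes possible is computing something per group while keeping one row per group. The sketch below uses a small made-up tibble, `measurements`, as a stand-in for the penguins data so that it runs without extra packages beyond dplyr, tidyr and purrr; the column names and values are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in data: one row per measured animal
measurements <- tibble(
  island = c("A", "A", "B", "B", "B"),
  mass   = c(3750, 3800, 5000, 5200, 4900)
)

# One row per island; the remaining columns move into a list column.
nested <- measurements %>%
  nest(data = -island)

# map_*() functions apply a function to each nested tibble.
summarised <- nested %>%
  mutate(
    n         = map_int(data, nrow),
    mean_mass = map_dbl(data, ~ mean(.x$mass))
  )

# unnest() reverses the nesting.
roundtrip <- nested %>%
  unnest(data)

summarised
roundtrip
```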

3.4 Exercises

3.4.1 Tidy data

3.5 Resources